A Study on Agreement in PICO Span Annotations
In evidence-based medicine, the relevance of medical literature is determined by
predefined relevance conditions. The conditions are defined based on PICO
elements, namely, Patient, Intervention, Comparator, and Outcome. Hence, PICO
annotations in medical literature are essential for automatic relevant document
filtering. However, defining boundaries of text spans for PICO elements is not
straightforward. In this paper, we study the agreement of PICO annotations made
by multiple human annotators, including both experts and non-experts.
Agreements are estimated by a standard span agreement (i.e., matching both
labels and boundaries of text spans), and two types of relaxed span agreement
(i.e., matching labels without guaranteeing matching boundaries of spans).
Based on the analysis, we report two observations: (i) Boundaries of PICO span
annotations by individual human annotators are very diverse. (ii) Despite the
disagreement in span boundaries, general areas of the span annotations are
broadly agreed upon by annotators. Our results suggest that applying the
standard agreement alone may underestimate the agreement of PICO spans, and
that adopting both standard and relaxed agreement measures is more suitable for
PICO span evaluation.
Comment: Accepted in SIGIR 2019 (short paper)
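The two agreement notions above can be made concrete with a small sketch. The span representation and matching rules below are illustrative assumptions, not the paper's exact definitions:

```python
# Illustrative sketch of standard vs. relaxed span agreement for PICO
# annotations. Spans are assumed to be (label, start, end) tuples over
# character offsets; this is not the paper's exact formulation.

def standard_agree(span_a, span_b):
    """Exact match: same label AND same boundaries."""
    return span_a == span_b

def relaxed_agree(span_a, span_b):
    """Relaxed match: same label and overlapping ranges, boundaries free."""
    label_a, start_a, end_a = span_a
    label_b, start_b, end_b = span_b
    return label_a == label_b and start_a < end_b and start_b < end_a

ann1 = ("Intervention", 10, 25)
ann2 = ("Intervention", 12, 25)  # same label, slightly different start

print(standard_agree(ann1, ann2))  # False: boundaries differ
print(relaxed_agree(ann1, ann2))   # True: labels match, spans overlap
```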
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and point-of-interests, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and the
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts have been devoted to dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim to offer an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
the Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we briefly review two
related problems, namely semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure
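As a point of reference for the evaluation metrics mentioned above, home-location prediction work in this area commonly reports accuracy within a distance threshold and median error distance. A minimal sketch, assuming predictions and ground truth are (lat, lon) pairs; names and data shapes are illustrative, not from the survey:

```python
import math

# Illustrative evaluation for home-location prediction: accuracy within
# a distance threshold (~100 miles / 161 km) and median error distance.

def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))

def evaluate(predicted, actual, threshold_km=161):
    """Return (accuracy within threshold, median error distance in km)."""
    errors = sorted(haversine_km(p, a) for p, a in zip(predicted, actual))
    acc = sum(e <= threshold_km for e in errors) / len(errors)
    return acc, errors[len(errors) // 2]

preds = [(40.7, -74.0), (34.0, -118.2)]  # predicted home locations
gold = [(40.8, -73.9), (37.8, -122.4)]   # ground-truth home locations
print(evaluate(preds, gold))
```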
From Counter-intuitive Observations to a Fresh Look at Recommender System
Recently, a few papers have reported counter-intuitive observations from
experiments on recommender systems (RecSys). One observation is that users who
spend more time and users who have many interactions with a recommendation
system receive poorer recommendations. Another observation is that models
trained by using only the more recent parts of a dataset show significant
performance improvement. In this opinion paper, we interpret these
counter-intuitive observations from two perspectives. First, the observations
are made with respect to the global timeline of user-item interactions. Second,
the observations are considered counter-intuitive because they contradict our
expectation on a recommender: the more interactions a user has, the higher
chance that the recommender better learns the user preference. For the first
perspective, we discuss the importance of the global timeline by using the
simplest baseline, Popularity, as a starting point. We answer two questions:
(i) why the simplest model, Popularity, is often ill-defined in academic
research, and (ii) why the popularity baseline is evaluated the way it is. The
questions lead to a detailed discussion of the data leakage issue in many
offline evaluations. As a result, model accuracies reported in many academic
papers are less meaningful and not directly comparable. For the second
perspective, we try to
answer two more questions: (i) why models trained using only the more recent
parts of the data demonstrate better performance, and (ii) why more
interactions from users can lead to poorer recommendations. The key to both
questions is user
preference modeling. We then propose to have a fresh look at RecSys. We discuss
how to conduct more practical offline evaluations and possible ways to
effectively model user preferences. The discussion and opinions in this paper
are on top-N recommendation only, not on rating prediction.
Comment: 11 pages, 5 figures
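A minimal sketch of the global-timeline issue discussed above: a random split can leak future interactions into training, while a time-aware split and a time-aware Popularity baseline respect the timeline. Data shapes and function names are assumptions, not from the paper:

```python
import random

# Each interaction is assumed to be a (user, item, timestamp) tuple.

def random_split(interactions, test_ratio=0.2, seed=0):
    """Common but leaky: test interactions may predate training ones."""
    data = interactions[:]
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_ratio))
    return data[:cut], data[cut:]

def temporal_split(interactions, test_ratio=0.2):
    """Time-aware: training data strictly precedes test data."""
    data = sorted(interactions, key=lambda x: x[2])
    cut = int(len(data) * (1 - test_ratio))
    return data[:cut], data[cut:]

def popularity_at(interactions, t, window=30):
    """A time-aware Popularity baseline: rank items by interaction counts
    in a recent window ending at time t, not over the whole dataset."""
    counts = {}
    for _, item, ts in interactions:
        if t - window <= ts <= t:
            counts[item] = counts.get(item, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)

logs = [("u1", "a", 1), ("u2", "a", 2), ("u1", "b", 5), ("u3", "b", 6)]
train, test = temporal_split(logs)
print(popularity_at(train, t=5))  # items popular as of time 5
```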
Deep Learning based Recommender System: A Survey and New Perspectives
With the ever-growing volume of online information, recommender systems have
been an effective strategy to overcome such information overload. The utility
of recommender systems cannot be overstated, given their widespread adoption in
many web applications and their potential to ameliorate many
problems related to over-choice. In recent years, deep learning has garnered
considerable interest in many research fields such as computer vision and
natural language processing, owing not only to its stellar performance but also
to its attractive property of learning feature representations from scratch. The
influence of deep learning is also pervasive, recently demonstrating its
effectiveness when applied to information retrieval and recommender systems
research. Evidently, the field of deep learning in recommender systems is
flourishing. This article aims to provide a comprehensive review of recent
research efforts on deep learning based recommender systems. More concretely,
we provide and devise a taxonomy of deep learning based recommendation models,
along with providing a comprehensive summary of the state-of-the-art. Finally,
we expand on current trends and provide new perspectives pertaining to this
exciting new development of the field.
Comment: The paper has been accepted by ACM Computing Surveys.
https://doi.acm.org/10.1145/328502
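For orientation, the sketch below shows a minimal learned-embedding recommender (classical matrix factorization trained with SGD) in NumPy; the deep models such a survey covers typically extend this setup with nonlinear interaction functions. This is an illustrative baseline, not a model from the survey:

```python
import numpy as np

# Minimal matrix-factorization recommender trained with SGD, as an
# illustrative instance of learned-embedding models.

rng = np.random.default_rng(0)
n_users, n_items, dim = 100, 50, 8
P = rng.normal(scale=0.1, size=(n_users, dim))  # user embeddings
Q = rng.normal(scale=0.1, size=(n_items, dim))  # item embeddings

# Toy observed interactions: (user, item, rating in 1..5)
data = [(rng.integers(n_users), rng.integers(n_items), rng.integers(1, 6))
        for _ in range(1000)]

lr, reg = 0.05, 0.01
for epoch in range(20):
    for u, i, r in data:
        pu = P[u].copy()
        err = r - pu @ Q[i]          # prediction error on this interaction
        P[u] += lr * (err * Q[i] - reg * pu)
        Q[i] += lr * (err * pu - reg * Q[i])

# Top-5 recommendations for user 0 by predicted score
scores = P[0] @ Q.T
print(np.argsort(-scores)[:5])
```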
Dataset vs Reality: Understanding Model Performance from the Perspective of Information Need
Deep learning technologies have brought us many models that outperform human
beings on a few benchmarks. An interesting question is: can these models solve
real-world problems whose settings (e.g., identical input/output formats) are
similar to those of the benchmark datasets? We argue that a model is trained to
answer the same
information need for which the training dataset is created. Although some
datasets may share high structural similarities, e.g., question-answer pairs
for the question answering (QA) task and image-caption pairs for the image
captioning (IC) task, they may represent different research tasks aiming for
answering different information needs. To support our argument, we use the QA
task and IC task as two case studies and compare their widely used benchmark
datasets. From the perspective of information need in the context of
information retrieval, we show the differences in the dataset creation
processes, and the differences in morphosyntactic properties between datasets.
The differences in these datasets can be attributed to the different
information needs of the specific research tasks. We encourage all researchers
to consider the information need perspective of a research task before
utilizing a dataset to train a model. Likewise, while creating a dataset,
researchers may also incorporate the information need perspective as a factor
to determine the degree to which the dataset accurately reflects the research
task they intend to tackle.
Comment: 19 pages, 5 figures
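One way to surface the morphosyntactic differences mentioned above is to compare simple surface statistics across datasets. A toy sketch, assuming each dataset is a list of raw strings; the statistics chosen here are illustrative, not the paper's exact measures:

```python
from collections import Counter

# Compare surface/morphosyntactic statistics between two text
# collections, e.g., QA questions vs. IC captions.

WH_WORDS = {"what", "who", "where", "when", "why", "how", "which"}

def profile(texts):
    tokens = [t.lower().strip("?.,!") for s in texts for t in s.split()]
    counts = Counter(tokens)
    return {
        "avg_len": sum(len(s.split()) for s in texts) / len(texts),
        "wh_ratio": sum(counts[w] for w in WH_WORDS) / len(tokens),
        "question_ratio": sum(s.rstrip().endswith("?") for s in texts) / len(texts),
    }

qa = ["What color is the car?", "Where was the photo taken?"]
ic = ["A red car parked on a quiet street.", "A dog running on the beach."]
print("QA:", profile(qa))
print("IC:", profile(ic))
```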